General Database Statistics Using Entropy Maximization
Abstract
We propose a framework in which query sizes can be estimated from arbitrary statistical assertions on the data. In its most general form, a statistical assertion states that the size of the output of a conjunctive query over the data is a given number. A very simple example is a histogram, which makes assertions about the output sizes of several range queries. Our model also allows much more complex assertions that include joins and projections. To model such complex statistical assertions, we propose to use the Entropy-Maximization (EM) probability distribution. In this model, any consistent set of statistics has a precise semantics, and every query has a precise size estimate. We show that several classes of statistics can be solved in closed form.
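The histogram case mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes integer-valued data, a hypothetical `(low, high, count)` bucket layout, and a hypothetical helper named `estimate_range_size`. When the only known statistics are the bucket counts, the maximum-entropy choice spreads each count uniformly over its bucket, so a range query is estimated by the overlapped fraction of each bucket.

```python
from typing import List, Tuple

# Hypothetical histogram bucket: (low, high, count), integer bounds inclusive.
Bucket = Tuple[int, int, int]


def estimate_range_size(buckets: List[Bucket], lo: int, hi: int) -> float:
    """Estimate how many rows satisfy lo <= value <= hi.

    Assumption: within a bucket, rows are spread uniformly over the values,
    which is the maximum-entropy choice when only bucket counts are known.
    """
    total = 0.0
    for b_lo, b_hi, count in buckets:
        # Number of integer values shared by the query range and the bucket.
        overlap = min(hi, b_hi) - max(lo, b_lo) + 1
        if overlap > 0:
            width = b_hi - b_lo + 1
            total += count * overlap / width
    return total


# Illustrative data only: three buckets of width 10.
histogram = [(0, 9, 100), (10, 19, 40), (20, 29, 60)]
estimate = estimate_range_size(histogram, 5, 14)
print(estimate)
```

Here the query range 5..14 covers half of the first bucket and half of the second, so the estimate combines half of each count. Assertions involving joins and projections, which the abstract also covers, do not reduce to this simple per-bucket rule.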
Related Resources
Statistical Mechanics of Classical N-Particle System of Galaxies in the Expanding Universe
For the distribution of classical non-interacting particles we use Maxwell-Boltzmann statistics. However, these statistics are not workable for classical interacting particles (galaxies). We attempt to modify Maxwell-Boltzmann statistics by incorporating a gravitational interaction term. The number of ways in which N-particles can have pair interaction due to gravitational interaction ...
Probabilistic Query Answering Using Views
The paper studies two probabilistic query evaluation problems. The general setting is that we are given a probability distribution on all possible database instances and have to compute the probability of a tuple belonging to the query’s answer. In the deterministic view problem, we are given a set of view instances and are asked to determine the probability of a tuple belonging to a query’s an...
Derivation of equilibrium and time-dependent solutions to M/M/∞//N and M/M/∞ queueing systems using entropy maximization
Queueing theory has provided the basis for remarkable successes in the performance modeling and analysis of computer systems [6, 19, 21]. Because it is clear that computer systems do not satisfy assumptions made by the stochastic process models that are used, this success has been somewhat puzzling; it appears that queueing theory equations have wider applicability than is suggested by their classic...
A Novel Content Based Image Retrieval Model Based on the Most Relevant Features Using Particle Swarm Optimization
Content Based Image Retrieval (CBIR) is the application of computer vision techniques to the image retrieval problem, that is, the problem of searching for digital images in large databases. Content-based image retrieval (CBIR) depends on extracting the most relevant features according to a feature selection technique. The integration of multiple features may cause the curse of dimensionality a...
Measure Selection: Notions of Rationality and Representation Independence
We take another look at the general problem of selecting a preferred probability measure among those that comply with some given constraints. The dominant role that entropy maximization has obtained in this context is questioned by arguing that the minimum information principle on which it is based could be supplanted by an at least as plausible "likelihood of evidence" principle. We then r...
Publication date: 2009